@yudongsi
Contributor

@yudongsi yudongsi commented Oct 8, 2024

No description provided.

@yudongsi yudongsi linked an issue Oct 8, 2024 that may be closed by this pull request
@whitneywhtsang
Contributor

@ESI-SYD Performance from https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11230502839 is 240 TFlops, but the reported performance was 262 TFlops. Can you investigate why?
There are also a lot of printouts like the ones below. Can they be cleaned up?

 problem size: (3072,3072), tiled_shape: (12,12), tiles: 144, dp_tiles: 96, sk_tiles: 48, iters_per_tile: 128, num_workgroups: 128, dp_workgroups: 96, dp_waves: 3, sk_groups_per_region: 32, sk_regions: 1, sk_waves: 1, sk_iters_per_normal_group: 192, sk_big_groups_per_region: 0, avail_xecores: 32

Local range: {1, 8, 4} 
SK Score: 51
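
For readers trying to interpret that printout, the numbers in it are mutually consistent. The following is only a sketch of the arithmetic, not XeTLA's actual scheduling code. It assumes one data-parallel workgroup per tile, data-parallel waves of `avail_xecores` workgroups, and an even split of stream-K iterations across groups. All variable names are taken from the log, and the relations between them are inferred from the values rather than from the source.

```python
# Values copied from the printout above.
tiled_shape = (12, 12)
dp_tiles, sk_tiles = 96, 48
iters_per_tile = 128
sk_groups_per_region, sk_regions = 32, 1
avail_xecores = 32

# Total output tiles, split into data-parallel (dp) and stream-K (sk) tiles.
tiles = tiled_shape[0] * tiled_shape[1]
assert tiles == dp_tiles + sk_tiles

# Assumption: one workgroup per dp tile, executed in waves of avail_xecores.
dp_workgroups = dp_tiles
dp_waves = dp_workgroups // avail_xecores

# Assumption: the sk tiles' k-iterations are pooled and divided across sk groups.
sk_groups = sk_groups_per_region * sk_regions
sk_total_iters = sk_tiles * iters_per_tile
sk_iters_per_normal_group = sk_total_iters // sk_groups
sk_big_groups_per_region = sk_total_iters % sk_groups  # groups taking one extra iteration

num_workgroups = dp_workgroups + sk_groups

print(tiles, dp_waves, sk_iters_per_normal_group,
      sk_big_groups_per_region, num_workgroups)
```

Under these assumptions the script reproduces the logged values: 144 tiles, 3 dp waves, 192 iterations per stream-K group, 0 big groups, and 128 workgroups.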

@yudongsi
Contributor Author

yudongsi commented Oct 9, 2024

Let me check. Previously I could get ~252 TFlops locally.

@yudongsi yudongsi marked this pull request as draft October 9, 2024 03:10
@yudongsi yudongsi force-pushed the yudong/xetla_streamk branch from 9e46570 to 9965f3e Compare October 10, 2024 05:25
@yudongsi
Contributor Author

Trying to disable the prints in intel/xetla#54.

@yudongsi yudongsi force-pushed the yudong/xetla_streamk branch from 9965f3e to 1efcc81 Compare October 15, 2024 02:54
@yudongsi yudongsi marked this pull request as ready for review October 15, 2024 02:54
@whitneywhtsang
Contributor

What has changed? Is the performance good now? Are the prints disabled?

@yudongsi
Contributor Author

The number of xecores was increased, and performance is now 251.6 TFlops, about 96% of the reported 262 TFlops. The change to disable the prints has not landed (their public repo does not appear to be active).

https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11338429336/job/31531618981

@yudongsi
Contributor Author

Note: The pre-commit check failure is not related to this PR.

@whitneywhtsang
Contributor

In https://github.com/intel/intel-xpu-backend-for-triton/actions/runs/11338429336, I see that stream-k performance is only 103 TFlops.

@yudongsi
Contributor Author

        M       K       N   Triton-GB/s    XeTLA-GB/s  Triton-GB/s-min  XeTLA-GB/s-min  Triton-GB/s-max  XeTLA-GB/s-max  Triton-TFlops  XeTLA-TFlops  Triton-TFlops-min  XeTLA-TFlops-min  Triton-TFlops-max  XeTLA-TFlops-max  Triton-CV  XeTLA-CV
0  3072.0  4096.0  3072.0  5.033170e+07  5.033177e+07     5.033169e+07    5.033173e+07     5.033170e+07    5.033179e+07     103.746892    251.641847          93.622131        168.239498         110.720402        280.920817   0.054009  0.174175
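
To put the two TFlops columns in perspective, the sketch below converts them to kernel times. It assumes the common GEMM convention of `2 * M * N * K` floating-point operations, which may not be exactly how the benchmark script computes its numbers. The helper `gemm_seconds` is a hypothetical name introduced here for illustration.

```python
def gemm_seconds(m: int, n: int, k: int, tflops: float) -> float:
    """Time implied by a TFlops figure, assuming 2*M*N*K flops per GEMM."""
    return 2 * m * n * k / (tflops * 1e12)

# Shape and mean TFlops values copied from the table row above.
M, K, N = 3072, 4096, 3072
triton_tflops = 103.746892
xetla_tflops = 251.641847

triton_ms = gemm_seconds(M, N, K, triton_tflops) * 1e3
xetla_ms = gemm_seconds(M, N, K, xetla_tflops) * 1e3

# Fraction of the 262 TFlops figure quoted earlier in this thread.
fraction_of_reported = xetla_tflops / 262

print(f"Triton: {triton_ms:.3f} ms, XeTLA: {xetla_ms:.3f} ms, "
      f"XeTLA vs reported: {fraction_of_reported:.1%}")
```

Under that assumption, the XeTLA-TFlops column corresponds to roughly 0.31 ms per GEMM and the Triton-TFlops column to roughly 0.75 ms.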

@whitneywhtsang
Contributor

Oops, I was looking at the wrong column.

Contributor

@whitneywhtsang whitneywhtsang left a comment

Please create an issue to track the removal of the prints.

@yudongsi
Contributor Author

#2489

@etiotto
Contributor

etiotto commented Oct 15, 2024

@ESI-SYD Reminder: the pre-commit check is failing.

@whitneywhtsang whitneywhtsang merged commit 6018c7b into main Oct 15, 2024
5 checks passed
@whitneywhtsang whitneywhtsang deleted the yudong/xetla_streamk branch October 15, 2024 19:52
Development

Successfully merging this pull request may close these issues.

[XeTLA] Add StreamK and SplitK implementation

5 participants